Recent research by Valette, Oller, and others has
shown the utility of dictation as a measure of general language
competence when correlated with achievement and proficiency batteries
for French and English as a second language. No such studies have
been conducted with Spanish. The investigator hypothesized that since
Spanish is a phonetic language permitting easy transcription without
comprehension, the dictation would not serve as a good substitute
measure of language competence. In order to test this hypothesis 127
students enrolled in first-year Spanish at the University of Colorado
were read a 105-word dictation together with a 100-item final
examination. The results of each test were then correlated by
computer and a Pearson product-moment coefficient of .50 was
obtained. The scores on both tests for all students are displayed on
a scatter diagram, and the reliability of both tests was ascertained
using the Kuder-Richardson 21 formula. The results indicate general
confirmation of the research hypothesis that the dictation is
significantly less useful as a proficiency measure for Spanish than
for French and English. (Author)
"Dictation as a Measure of Spanish Language Proficiency"*


The dictation is one of the oldest techniques known for testing progress
in foreign language learning. It has long been associated with the traditional
or grammar translation method and for this reason was rejected by Gouin and others
who propagated the "natural" method during the second half of the nineteenth
century. Later it became popular again under the direct method, especially as
phonetic dictation or phonetic transcription of spoken language, a task which was
especially pleasing to direct methodologists because of their scientific interest
in phonology. The reading method, which was popular during the 1930's and 1940's,
employed the dictation only sparingly since this method's emphasis on listening
comprehension and spelling was slight.

With the advent of the audio-lingual method at the beginning of the
1960's, dictation again received considerable criticism, partly due to its
association with the writing skill, which was to be postponed, and partly because
of its association with the grammar translation method, which became the whipping
boy of new-key methodologists. Therefore, in spite of substantial support for
research in foreign language learning during the decade following N.D.E.A., no
research or interest in the dictation was demonstrated, but for a single exception.

In 1964, Valette reported on a study she conducted at the University of
South Florida. During a first semester French course, she divided six beginning
French classes into two treatment groups. Group A received regular dictations
during each class meeting throughout the semester. Group B received only sporadic
dictations for a total of only three or four during the semester. At the close
of the course, all sections received the same departmental examination which
included a 50 word dictation. The maximum possible raw score on the examination
was 155, while maximum raw score on the subpart was 20. After calculating a
Pearson product-moment coefficient for both groups between part score on the
dictation and total score on the test, Valette found correlations of .78 for Group
A and .89 for Group B. She concluded that for French at least the dictation is a
reasonably good measure of overall student proficiency, especially when practice
in taking dictation has not been offered.

*The following paper was presented at the Seminar on Tests and Testing held at the

Ironically, Valette's finding went relatively unnoticed for the
remainder of the 1960's. Interest in the dictation returned again in the 1970's
due to the extensive research into integrated measures of language proficiency
spearheaded by John Oller. After reporting on the high correlations obtained
between short cloze tests and multi-section proficiency tests, Oller and others
turned their attention to the dictation. In 1971, Oller reported a correlation
of .86 between a section on dictation and total score on the UCLA English as a
Second Language Placement Examination. Four years later, in response to criticism
of his figures by Breitenstein, Oller and Streiff published a corrected correlation
coefficient of .94.

In a more recent article on testing E.S.L. university students in
Iran, Irvine, Atai, and Oller reported similar findings, although of lesser
magnitude, after correlating scores on a cloze test and a dictation with scores
on the Test of English as a Foreign Language published by Educational Testing Service.

Thus far research on the dictation has focused on two languages,
English and French. The results of this research have greatly simplified the
task of proficiency and placement testing in these languages and rejuvenated
confidence in the use of dictation by E.S.L. and French teachers. Oller and
others have posited that the success of the dictation is due to the fact that it
itself is an integrated measure of language competency, testing many factors
such as sound discrimination, word recognition, rapid decoding of speech,
information storage in short term memory, recoding and spelling. In several interesting
articles, he refers to the cognitive model of the active listener who constructs
in his mind what the speaker is saying or will say, and then compares this
expected model with what he actually hears. The ability to accurately and rapidly
construct this model, which Oller calls a grammar of expectancies, and then
accurately compare it with the perceived stream of speech, is viewed as the
active application of the listener's underlying linguistic competence in the
language.

Yet it remains to be seen whether the activity is as useful in
languages such as Spanish and German, which show much simpler phonological and
orthographic systems, thereby allowing the learner to merely transcribe what he
hears. In order to ascertain whether the dictation correlates highly with
comprehensive language skills test scores in Spanish, and can therefore serve as a
good "quick and dirty" Spanish proficiency measure, the researcher conducted the
following experiment.

Methodology. All one hundred twenty seven (127) students enrolled in the second
semester of a first year Spanish course at the University of Colorado were
selected as subjects. Undergraduate students at the University of Colorado are
generally above average in intelligence and show mean scores of about 550 on the
verbal and 570 on the quantitative sections of the College Entrance Examination
Board's Scholastic Aptitude Test. The subjects were given a 100 item final
written examination three hours in length. In addition, approximately one hour
after the test began, students were administered a 106 word dictation constructed
by the investigator and a graduate student. When both tests were graded they
were turned over to the researcher for statistical analysis.

Since the purpose of this study is to determine the suitability of the
dictation as a substitute measure of achievement or proficiency, the two sets of
data were correlated using the Pearson product-moment correlation
coefficient (r). This statistical technique is appropriate when we
wish to correlate scores on two interval scales. The following decision
rules were made regarding the interpretation of the data. Since a
high correlation coefficient is necessary to demonstrate the empirical
validity of an instrument, and since the N was large enough to give
substantial power to the statistical technique, it was decided that
the null hypothesis, ρ = 0, would not be rejected unless the
probability of the obtained r was less than one percent (p < .01).
Furthermore, since correlations based on a single sample are subject
to sampling error, it was decided to construct a 95% confidence
interval around the obtained correlation coefficient using Fisher's
Z-transformation of r as described by Glass and Stanley. Such a
confidence interval offers considerable certainty of capturing the
true correlation between the two instruments while providing a truer
picture of the generalizability of the findings.
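
The decision rule described here can be illustrated with a short sketch. It is not the SPSS routine used in the study. It assumes plain Python, hypothetical function names (`pearson_r`, `p_value_fisher`, `reject_null`), and a large-sample normal approximation based on Fisher's Z in place of an exact test of the null hypothesis.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def p_value_fisher(r, n):
    """Approximate two-sided p for H0: rho = 0.

    Assumption: normal approximation via Fisher's Z, valid for -1 < r < 1.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def reject_null(r, n, alpha=0.01):
    """Apply the stated rule: reject H0 only if p falls below alpha."""
    return p_value_fisher(r, n) < alpha
```

Under these assumptions, a correlation of about .50 in either direction with N = 127 clears the one percent criterion comfortably, while a correlation of .10 with the same N does not.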

The data were then analyzed as described above on a CDC 6400
computer employing the standard statistical programs included in the
Statistical Package for the Social Sciences.

Calculation of the reliability of a teacher scored test is often
tedious, particularly when the test contains a large number of items
and is given to a large number of students. In order for the normal
point-biserial coefficient to be calculated, it would be necessary to
tally correct and incorrect responses to every item on the two tests.
This would involve the collection of some 27,000 pieces of data.

Fortunately, there is a much simpler procedure available to
researchers known as Kuder-Richardson Formula 21. KR 21 requires
knowledge of only the test mean, the standard deviation, and the number
of items on the test, and these statistics are readily available
through computer analysis of scores. KR 21, however, is only an estimate
of true reliability. In analyzing 58 tests, Lord found that KR 21
consistently underestimated the true reliability, though usually by
.05 or less. Because of this, KR 21 is often used as a lower-bound or
minimal estimate of reliability (Stanley and Hopkins, 1972, p. 127).
In this study, KR 21 reliability coefficients were calculated by the
investigator on a hand calculator.
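
As an illustration of how little KR 21 requires, the following sketch applies the standard formula to summary statistics alone. The function name `kr21` is hypothetical, and the inputs are taken from the descriptive statistics reported for the two tests; the final exam standard deviation of 16.6 is an assumed reading of a partly illegible table entry, and 106 is used as the number of dictation items.

```python
def kr21(n_items, mean, std_dev):
    """Kuder-Richardson Formula 21 from item count, mean, and standard deviation."""
    variance = std_dev ** 2
    return (n_items / (n_items - 1)) * (
        1 - mean * (n_items - mean) / (n_items * variance)
    )

# Assumed inputs: summary statistics reported for this study (see lead-in).
final_exam = kr21(100, 68.9, 16.6)   # scored as number of right answers
dictation = kr21(106, 17.9, 7.2)     # scored as number of errors
print(round(final_exam, 2), round(dictation, 2))
```

With these inputs the sketch gives values that round to .93 and .72. Because the formula depends on the mean only through the product of the mean and its complement, it gives the same result whether a test is scored by right answers or by errors.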

Instrumentation. The final examination was an achievement test based
on the content of the textbook Español a lo vivo by Hansen and Wilkins.
It was graded by each student's regular instructor based on a previously
agreed upon system of scoring. Since some sections required the student
to write several words or a sentence, some errors counted only one-half
point. The examination consisted of sentence rewrite exercises using
various syntactical transformations, and fill-in-the-blank exercises for
testing morphology. It was a totally discrete point test and was given
during final examination week in May, 1976. The test was designed to
be cumulative in nature and covered the content of the entire text.
It was not merely limited to the second half of the book. Because of
this, the test can be considered a good indicator of the beginning
student's grammatical competence.

The dictation was likewise based on the vocabulary and structures
encountered by students taking first year Spanish with the Hansen and
Wilkins text. Designed to be somewhat challenging in order to obtain
a reliable spread of scores, it was administered in the normal manner
as described by Valette and others, and lasted about twelve minutes.
Students first listened to the entire paragraph for meaning. Then each
breath group of five to eight words was read twice by a graduate student
from Mexico. Finally, the entire selection was reread at normal speed.
Following the dictation, students continued work on the examination.
Since one test was given in the middle of the other, history and
maturation can be disregarded as possible threats to the internal
validity of this study.

The dictations were graded by a graduate student
under the direction of the investigator based on a system
which counted one point off for each word which was
incorrectly written in any way or which should not have
appeared. No partial credit was given since previous
research on the cloze test has shown no change in the
respective ranks of subjects when more elaborate scoring
systems are used. Oller (1975) has also employed this
same procedure in his research on the dictation.
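
A word-level scoring rule of this kind could be approximated mechanically as follows. This is only a sketch and not the procedure the graders followed: the function name `dictation_errors` is hypothetical, the alignment relies on Python's `difflib.SequenceMatcher`, and it assumes that an omitted word counts as one error in the same way as a misspelled or extra word.

```python
from difflib import SequenceMatcher

def dictation_errors(reference, response):
    """Count one error per miswritten, omitted, or extra word (no partial credit)."""
    ref = reference.split()
    hyp = response.split()
    errors = 0
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # replace: paired words count once each, leftovers count as
            # omissions or intrusions; delete/insert: count the words involved.
            errors += max(i2 - i1, j2 - j1)
    return errors

# Hypothetical example: one misspelling ("nino") and one intruding word ("el").
print(dictation_errors("el niño come pan", "el nino come el pan"))
```

Comparison is by exact string match, so a missing accent or a capitalization difference counts as a full error, in keeping with the rule that a word written incorrectly in any way loses one point.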

Analysis of Results: Table I depicts some descriptive
statistics for both tests. The correlation between the two
indices was .495 which is significant at the .001 level.
This again confirms the validity of the dictation as a
proficiency measure.

By employing Fisher's Z-transformation of r, we can
produce a confidence interval on the true correlation
between the two indices for the population, by a process
which captures ρ within its limits 95% of the time. The
resulting interval is .36 - .72. This means that we can be
reasonably certain that the true correlation produced by
taking an infinite number of samples is greater than .36
and less than .72.
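
The interval construction can be retraced with a few lines of code. The sketch below is an illustration under stated assumptions rather than the computation actually performed: the function name `fisher_ci` is hypothetical, the inputs are r = .495 and N = 127, and the standard error on the Z scale is taken as 1/sqrt(N - 3).

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Return (limits on the Z scale, limits on the r scale) for a 95% interval."""
    z = math.atanh(r)                      # Fisher's Z-transformation of r
    half_width = z_crit / math.sqrt(n - 3)
    z_limits = (z - half_width, z + half_width)
    r_limits = tuple(math.tanh(v) for v in z_limits)  # transform back to r
    return z_limits, r_limits

z_limits, r_limits = fisher_ci(0.495, 127)
print([round(v, 2) for v in z_limits], [round(v, 2) for v in r_limits])
```

Under these assumptions the limits on the Z scale come out near .37 and .72, and after transformation back to the r scale they come out near .35 and .62. The figures quoted in the text resemble the Z-scale limits, so readers may wish to check which scale the reported interval is expressed on.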

TABLE I
DESCRIPTIVE STATISTICS

              Final Exam Score    Errors on Dictation
Mean                68.9                17.9
Std Dev             16.6                 7.2
Std Err              1.4                  .6
High                94.5                40.0
Low                 12.5                 6.0
Range               82.0                34.0
KR21                 .93                 .72

r_xy = -.495     p < .001

Note: The score on the final exam was determined by counting
the number of right answers. The score on the dictation
was determined by counting the number of errors. The
result is a negative correlation coefficient. A positive
correlation of equal magnitude would be derived by scoring
according to the number of right answers or wrong answers
on both tests.

The reliability of both tests is good, particularly when one
considers that the coefficients reported here, .72 for the
dictation and .93 for the final exam, are minimal or lower
bound estimates. The reliability of the final exam indicates
that the test functioned as an effective discriminator
between different levels of knowledge among first year
students.

Because the relationship between two variables will be
weakened by any lack of reliability in the measurement of
either or both variables, statisticians have developed a
technique for deriving the true correlation. This procedure,
called the correction for attenuation, estimates what the
correlation between two variables would be if both tests
were perfectly reliable. The formula for deriving an
estimate of the "true" relationship is

$$ r_{x_t y_t} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}} $$

where $r_{x_t y_t}$ = the correlation between true scores on
variables x and y,

$r_{xy}$ = the obtained correlation between variables
x and y, and

$r_{xx}$, $r_{yy}$ = the reliability coefficients of variables
x and y, respectively.

By substituting the obtained coefficients into the
above formula, we get:

$$ r_{x_t y_t} = \frac{r_{xy}}{\sqrt{(.93)(.72)}} = .61 $$

This procedure further corroborates the confidence
interval (.36 - .72) which was developed earlier. Again
it appears that even if both tests had been harder, resulting
in a greater dispersion of scores and differentiation among
students, the resulting correlation between test scores
would still be moderate, rather than high.
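
The arithmetic of the correction for attenuation is easy to check. The sketch below assumes a hypothetical function name, `correct_for_attenuation`, and uses the reliabilities of .93 and .72 together with two candidate values for the obtained correlation, .495 as given in the analysis and .50 as rounded in the abstract, since it is not certain which one entered the hand calculation.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between true scores from an observed r_xy."""
    return r_xy / math.sqrt(r_xx * r_yy)

unrounded = correct_for_attenuation(0.495, 0.93, 0.72)
rounded = correct_for_attenuation(0.50, 0.93, 0.72)
print(round(unrounded, 3), round(rounded, 3))
```

Starting from .495 the corrected value is roughly .60, and starting from .50 it is roughly .61. Either way the corrected coefficient stays in the moderate range.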

Since correlations may be linear, curvilinear, or
random, it is best to depict them on a scattergram (or
scatter diagram). A visual understanding of the strength
of a relationship can be gained by studying a two-way
scattergram of tallies. Each asterisk represents the
intersection of two scores for a single subject. If two
or more subjects fall into the same position on the
scattergram, the actual number of subjects is printed.

[Scattergram. Horizontal axis: errors on dictation. Figure not reproduced.]

The relationship between two indices is linear if an
imagined straight line through the center of the tallies,
called a regression line, more closely fits the pattern of
the scattergram than does any curved line. The scattergram
reproduced here indicates a definite linear relationship
between the two indices, so that as the score on the
final decreases, errors on the dictation increase.
Nevertheless, the relationship depicted is far from
perfect, as one can readily perceive many scores which do
not fit the regression line closely. In such cases, the
score on one test will not serve as a predictor of the
score on another since the difference between the predicted
score and the actual obtained score is considerable. It
is on these differences, otherwise known as errors in
prediction, that the correlation coefficient is based.
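
The link between errors in prediction and the correlation coefficient can be made concrete with a small sketch. The scores below are invented for illustration and are not the study's data, and the function names `fit_line` and `r_squared_from_errors` are hypothetical. The sketch fits a least-squares regression line and then recovers the squared correlation from the sizes of the prediction errors alone.

```python
def fit_line(xs, ys):
    """Least-squares regression line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared_from_errors(xs, ys):
    """1 - (sum of squared prediction errors) / (total sum of squares)."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical scores, invented for illustration only (not the study's data).
errors_on_dictation = [6, 10, 14, 18, 22, 30, 40]
exam_scores = [90.0, 70.0, 82.0, 65.0, 74.0, 50.0, 45.0]
slope, intercept = fit_line(errors_on_dictation, exam_scores)
r2 = r_squared_from_errors(errors_on_dictation, exam_scores)
print(slope < 0, round(r2, 2))
```

The negative slope mirrors the pattern described in the text, in which exam scores fall as dictation errors rise. The larger the prediction errors are relative to the overall spread of scores, the smaller the squared correlation becomes.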
Conclusions and Discussion. This study compared scores on
a dictation with scores on a 100 item Spanish achievement
test considered to be a valid indicator of overall
grammatical competence. It found the dictation to be only
a moderately good indicator of overall competence when used
with learners of Spanish. It does not purport to contradict
the findings of other researchers who have found higher
correlations for learners of English and French. Indeed,
as the investigator hypothesized before collecting the data,
Spanish can be transcribed by the language learner with
considerable facility, and this reduces the dependency on
the internalized grammar of expectancies posited by Oller.
While the Spanish learner will employ his internalized
grammar of the language in taking a dictation, he is not
left to depend on it alone. If he does not recognize a
word he may still transcribe it correctly due to the good
fit of the language. He is not forced to construct a
distorted version of what the "dictator" has said (though
he sometimes does) through the active use of his internalized
grammar. He can instead rely on simple spelling conventions
to fill in the gaps when his linguistic competence fails
him. Oller (1976, p. 77) has stated, "Low intercorrelations
must be interpreted as indicating low test validity, i.e.,
that one of the tests being correlated does not tap
underlying linguistic competence or that it does so to an
insufficient extent." It is my belief, supported by the
findings described here, that the dictation does not
sufficiently tap the learner's underlying competence so that
the learner must depend on that competence exclusively in
order to perform correctly in Spanish. On the other hand,
the validity of the cloze test as a proficiency measure
would be generalizable to Spanish because in constructing
an appropriate response the learner is depending exclusively
on clues provided him by his internalized grammar. If such
is the case, we can expect a large disparity in correlations
on cloze tests and dictations in Spanish. It is probable that
future research applying integrated measures to languages
with good fit will demonstrate this.



Charles Stansfield
University of Colorado
Boulder, Colorado